ABSTRACT

We explore the structures of protoclusters and their relationship with high redshift 
clusters using the Millennium Simulation combined with a semi-analytic model. We 
find that protoclusters are very extended, with 90 per cent of their mass spread across
∼35 h^-1 Mpc comoving at z = 2 (∼30 arcmin). The ‘main halo’, which can manifest
as a high redshift cluster or group, is only a minor feature of the protocluster,
containing less than 20 per cent of all protocluster galaxies at z = 2. Furthermore, many
protoclusters do not contain a main halo that is massive enough to be identified as 
a high redshift cluster. Protoclusters exist in a range of evolutionary states at high 
redshift, independent of the mass they will evolve to at z = 0. We show that the 
evolutionary state of a protocluster can be approximated by the mass ratio of the 
first and second most massive haloes within the protocluster, and the z = 0 mass of 
a protocluster can be estimated to within 0.2 dex accuracy if both the mass of the
main halo and the evolutionary state are known. We also investigate the biases
introduced by only observing star-forming protocluster members within small fields. The
star formation rate required for line-emitting galaxies to be detected is typically high, 
which leads to the artificial loss of low mass galaxies from the protocluster sample. 
This effect is stronger for observations of the centre of the protocluster, where the 
quenched galaxy fraction is higher. This loss of low mass galaxies, relative to the field, 
distorts the size of the galaxy overdensity, which in turn can contribute to errors in 
predicting the z = 0 evolved mass. 

Key words: methods: numerical - methods: statistical - galaxies: clusters: general 
- galaxies: formation - galaxies: evolution - cosmology: theory 


1 INTRODUCTION 

In a cold dark matter universe with a cosmological constant
(ΛCDM), structure forms through hierarchical growth with
smaller haloes merging to form larger ones. Galaxy clusters
in the present day Universe are the most massive structures
to have formed and were the result of the merging of many
smaller haloes. Clusters are typically virialised dark matter
haloes of mass greater than 10^14 M_⊙ containing a hot X-ray
Intra-Cluster Medium (ICM) and red, passive galaxies.

At higher redshift, z > 1.5, most clusters were not 
the massive virialised haloes that we see today. Instead we 
see their progenitors, a diffuse collection of haloes that will 
merge to make the final halo. The term ‘protocluster’ is often 
used to describe this state, but differing definitions of what 
a protocluster is exist in the literature. While some define 
a protocluster as all the haloes at a given redshift that will 
merge to make the final cluster, others define it as being just
the most massive progenitor halo, sometimes referred to as 
the main halo. While using the latter definition dramatically 
reduces the observational expense, it risks missing galaxies 
undergoing environmental preprocessing and only captures 
part of what is going on in the forming cluster. 


Several high redshift galaxy clusters have now been
detected through X-ray emission, the Sunyaev-Zel’dovich
(SZ) effect, as well as through photometric redshift hunts in
large deep surveys (Gobat et al. 2011; Stanford et al. 2012;
Zeimann et al. 2012; Fassbender et al. 2014; Andreon et al.
2014). The properties of the ICM and galaxies indicate that
these structures are already collapsed, i.e. these objects are
single collapsed main haloes. However, a great deal of cluster
growth occurs at relatively late times (z < 1; Chiang et al.
2013), and many of the galaxies and dark matter that end
up in the z = 0 cluster will not be located in the main
halo of the protocluster at high redshift. In this paper we
investigate how much of the matter and galaxies reside in
the main halo compared to the entire protocluster as a
function of redshift, and use this to investigate its significance.
Additionally we look at whether protoclusters with evolved 
main haloes are representative of all protoclusters, or are a 
subsample that are easier to detect. 

Identifying protoclusters has so far been challenging
due to their low number density and the faintness of distant
galaxies. One of the most successful methods for detecting
protoclusters is to use High Redshift Radio Galaxies
(HzRGs) as a tracer population to locate overdense regions
([…] 2003; Venemans et al. 2007; Galametz et al. 2010;
Hatch et al. 2011a; Galametz et al. 2013; Wylezalek et al.
2013; Cooke et al. 2014). These galaxies are among the
most massive galaxies at all epochs (Seymour et al. 2007),
but the large galaxy overdensities that surround these
radio-loud galaxies exceed those of similar mass radio-quiet
galaxies (Hatch et al. 2014). Both Ramos Almeida et al.
(2013) and Hatch et al. (2014) concluded that dense
environments foster the formation of radio-loud jets from AGN,
which explains why HzRGs are excellent beacons of galaxy
protoclusters and high redshift clusters.

An alternative technique is to identify protoclusters
in large surveys with accurate photometric redshifts.
Precise photometric redshifts at z > 2 are difficult to
obtain due to the Balmer break shifting into near-infrared
wavelengths, but Spitler et al. (2012) have shown that Virgo-like
cluster progenitors can be found if medium-band near-infrared
filters are used. Chiang et al. (2013) advocate a
protocluster detection method that finds galaxy
overdensities in 15 Mpc comoving windows; applying this method
to the 1.62 deg^2 COSMOS/UltraVISTA field (Muzzin et al.
2013) has resulted in 36 candidate structures (Chiang et al.
2014). This method is effective because of the correlation
between aperture density and halo mass (Haas et al. 2012;
Muldrew et al. 2012); however, at high redshift there is
considerable uncertainty in this relation due to projection effects
(Shattow et al. 2013).

A number of techniques have been used to isolate the
galaxies within high redshift clusters and protoclusters for
further study. Photometric redshifts from deep multi-band
data can reach accuracies of Δz/(1 + z) = 0.03, and although
(proto-)cluster galaxies have been selected using photometric
redshifts (e.g. Tanaka et al. 2010), the sample is often
incomplete and greatly contaminated by foreground and
background galaxies. One of the most successful techniques for
locating clean samples of protocluster galaxies is using narrow-band
filters with central wavelengths matched to
the wavelength of an emission line from protocluster galaxies
(e.g. Kurk et al. 200[?]; Venemans et al. 2007; Hatch et al.
2011b; Cooke et al. 2014). The ideal line is Hα since it is a
strong line which is least affected by dust absorption.
Selecting galaxies based on their line emission means only active
galaxies are located, i.e. star-forming galaxies and AGN.
This can limit our view of the protocluster in unexpected
ways. Here we explore how this selection method can give a
biased view of the protocluster.

In this paper we explore the galaxies that make up
protoclusters using a semi-analytic model built upon the
Millennium Simulation. In Section 2 we describe the simulations
used and how we constructed the protocluster catalogue.
Using this mock catalogue, in Section 3 we give an overview
of the spatial properties of protoclusters and their member
galaxies. We then examine two fundamental issues
concerning protoclusters: the relationship of the main progenitor
halo (which is sometimes observed as the high redshift
cluster) to the rest of the protocluster and the cluster’s z = 0
mass; and how our understanding of protoclusters is biased
when only active protocluster galaxies are observed. In
Section 4 we summarise our findings and reflect on the
implications they have for interpreting observations of protoclusters
and high redshift clusters.


2 METHODS 

To construct a statistically large sample of galaxy clusters
whose evolution can be tracked back to high redshift,
we used the Guo et al. (2011) semi-analytic model
applied to the Millennium Simulation (Springel et al. 2005).
Clusters were identified as haloes with masses greater than
10^14 h^-1 M_⊙ at z = 0, while protoclusters were defined as
the cluster progenitors.


2.1 The Millennium Simulation and 
Semi-Analytic Model 

The Millennium Simulation follows the evolution of 2160^3
dark matter particles in a cube of comoving side length
500 h^-1 Mpc, using the N-body code GADGET-2 (Springel
2005). It adopts a ΛCDM cosmology with parameters Ω_0 =
0.25, Ω_Λ = 0.75, h = 0.73, n = 1 and σ_8 = 0.9, consistent
with the Two-Degree Field Galaxy Redshift Survey
(2dFGRS; Colless et al. 2001) and the first-year Wilkinson
Microwave Anisotropy Probe data (WMAP-1; Spergel et al.
2003).

Haloes were detected using a two-step procedure.
Firstly, a Friends-of-Friends algorithm (FoF; Davis et al.
1985) with linking length b = 0.2 was used to identify
haloes, and these were then post-processed using SUBFIND
(Springel et al. 2001). All haloes with more than 20 particles
were used to construct merger trees. We note that similar
results are found with other halo finders (Muldrew et al. 2011;
Knebe et al. 2011).

To populate the simulation with galaxies, the Guo et al.
(2011) semi-analytic model was applied to the resulting
merger trees. This model is an updated version of
that previously presented in Croton et al. (2006) and
De Lucia & Blaizot (2007) and gives a better fit to the
redshift evolution of the galaxy stellar mass function. The
model includes prescriptions for gas infall, shock heating,
cooling, star formation, stellar evolution, supernova
feedback, black hole growth and feedback, metal enrichment,
mergers, and tidal and ram-pressure stripping. Full details
of these implementations can be found within the previously
referenced papers. For the purpose of this study we cut the
semi-analytic catalogue to only include galaxies with stellar
masses greater than 10^8 h^-1 M_⊙. This is above the resolution
limit adopted by Guo et al. (2011), but is still below the
detection threshold of most observational protocluster studies
(e.g. Cooke et al. 2014). All results that depend on
stellar mass in this paper are presented against mass or with
different minimum cuts to illustrate the effect of having a
minimum galaxy mass cut.


© 2015 RAS, MNRAS 000, 1-14



Figure 1. The spatial extent of protoclusters at z = 2 (left panel), 1 (centre panel) and 0 (right panel), with final cluster masses
of M_200 = 10^[…] h^-1 M_⊙ (top row), 10^[…] h^-1 M_⊙ (middle row) and 10^[…] h^-1 M_⊙ (bottom row). Each window is 45 × 45 h^-1 Mpc
comoving, which corresponds to 41 arcmin and 65 arcmin at z = 2 and z = 1 respectively (Wright 2006). Black points represent galaxies
of stellar mass greater than 10^8 h^-1 M_⊙ that will end up in the cluster while grey points represent those that will not. (Only 25 per cent
of the background galaxies, grey points, are plotted to reduce image size.) The red circle corresponds to the z = 0 centre and comoving
virial radius of the cluster.


The cosmological parameters used for the Millennium
Simulation were in agreement with the results of
WMAP-1, but have become slightly discrepant with the latest
values from Planck (Planck Collaboration et al. 2013).
Angulo & White (2010) proposed a method of rescaling
dark matter simulations to different cosmologies by
reassigning the mass and position of particles and the redshift
of the snapshot. Guo et al. (2013) applied this method to
the Millennium Simulation to obtain a galaxy catalogue
for WMAP-7 cosmology (Komatsu et al. 2011). They found
that the increased matter density, Ω_m, offsets the effect of
a decreased linear fluctuation amplitude, σ_8, which leads
to very similar results for z < 3. This should have even
less of an effect for Planck cosmology, where the rescaling
from WMAP-1 is not as large (Henriques et al. 2014).
Further comparison for protoclusters between WMAP-1 and
WMAP-7 cosmology was made by Chiang et al. (2013), who
found little difference in results. This confirms that using
a simulation based on WMAP-1 cosmology will have little
overall impact on our results.

2.2 Protocluster Identification 

We identified galaxy clusters in the simulation exclusively by
dark matter halo mass. All haloes with M_200 ≥ 10^14 h^-1 M_⊙
at z = 0, where M_200 is the mass enclosed by a sphere whose
mean density is 200 times the critical density of the Universe, were
defined as galaxy clusters. This gave a total of 1,938 clusters
in our sample. All semi-analytic galaxies that are members of
the FoF haloes are then classed as galaxy cluster members.
Each halo consists of a ‘central’ galaxy, which is at the centre
of the halo, and ‘satellite’ galaxies.

For protoclusters, we trace the merger tree back in time
to each redshift of interest. For each z = 0 cluster, we
identify all the haloes at a given redshift that will merge to form
it and define this set as the protocluster. All galaxies that are
associated with these haloes are then classed as protocluster
members. For example, at z = 2 there are 1,938
protoclusters, which are the progenitors of the z = 0 clusters,
but these are made up of 639,253 individual haloes with a
central galaxy of at least M_* = 10^8 h^-1 M_⊙. Unless stated
otherwise, the protocluster studies presented in this paper
are for z = 2.

Figure 2. The average radius that encloses 90 per cent of the stellar mass of a protocluster at different redshifts, for binned z = 0 cluster
masses. The left panel shows the comoving radius, the centre panel the physical radius and the right panel the angular projection. Error bars
represent 1σ scatter and are offset about the middle mass bin by δz = 0.05 for clarity. This radius is tightly correlated with the radius
enclosing 90 per cent of the dark matter mass.
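As a rough illustration of the tracing step described in Section 2.2, the sketch below groups the haloes present at one snapshot by the z = 0 halo they eventually merge into. It assumes each halo carries a pointer to its descendant in the next snapshot; the dictionary layout, the function names and the toy halo IDs are hypothetical and do not reproduce the Millennium database schema.

```python
from collections import defaultdict


def find_root(halo_id, descendant):
    """Follow descendant pointers until a halo with no descendant (z = 0) is reached."""
    while descendant.get(halo_id) is not None:
        halo_id = descendant[halo_id]
    return halo_id


def protocluster_members(haloes_at_z, descendant, cluster_ids):
    """Group the haloes of one snapshot by the z = 0 cluster they will merge into."""
    members = defaultdict(list)
    for halo_id in haloes_at_z:
        root = find_root(halo_id, descendant)
        if root in cluster_ids:  # keep only progenitors of haloes classed as clusters
            members[root].append(halo_id)
    return dict(members)


# Toy merger tree (hypothetical IDs): haloes 1, 2 and 3 end up in cluster 100,
# halo 4 ends up in halo 200, which is not massive enough to be a cluster.
descendant = {1: 10, 2: 10, 3: 11, 10: 100, 11: 100, 4: 200, 100: None, 200: None}
groups = protocluster_members([1, 2, 3, 4], descendant, cluster_ids={100})
```

In this picture a protocluster at a given redshift is simply the list of haloes attached to one z = 0 cluster ID, and the protocluster galaxies are those hosted by the listed haloes.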


3 RESULTS 

The results are presented in three sections looking at different
aspects of protoclusters. Firstly, in Section 3.1 we explore
the distribution of protocluster member galaxies to explain
what protoclusters are. In Section 3.2 we examine the
relationship between protoclusters and their main haloes.
Finally, in Section 3.3 we explore how limiting our observations
to only the active subset of protocluster galaxies can affect
our understanding of protoclusters.

3.1 The Distribution of Protocluster Galaxies 

To begin, we explore the distribution of galaxies that make
up a protocluster. Figure 1 displays the spatial extent of
protocluster galaxy members with mass M_* ≥ 10^8 h^-1 M_⊙
at z = 2, 1 and 0 for z = 0 masses of M_200 = 10^[…], 10^[…] and
10^[…] h^-1 M_⊙. The red circle corresponds to the comoving
z = 0 virial radius of the cluster. At z = 2 the
most massive cluster (top left panel) extends to
45 h^-1 Mpc comoving (15 h^-1 Mpc physical), showing a rich
structure of haloes and filaments. The structure is far from
the collapsed single halo it becomes at z = 0 (top right
panel). For the lower mass clusters, a similar filamentary
distribution is visible at high redshift, but the overall spread
is much smaller.

Due to the limited fields-of-view of instruments, targeted
observational imaging studies of protoclusters typically use
windows of a few arcmin on a side (e.g. 2.5 arcmin in
Cooke et al. 2014 or 7 arcmin in Koyama et al. 2013). For
z = 2, and the cosmology of the Millennium Simulation,
these correspond to 2.8 h^-1 Mpc and 7.7 h^-1 Mpc comoving
respectively (determined using ‘The Cosmology Calculator’;
Wright 2006). Comparing these to the full distribution of
the protocluster, the left hand panels of Figure 1 demonstrate
that in all but the lowest mass case, only a small
area of the protocluster is being captured. In the top panel,
for the most massive cluster, the red circle corresponds to
the z = 0 virial radius of 2.16 h^-1 Mpc comoving. This circle
would enclose the smaller aperture of Cooke et al. (2014).
This means that any observations of protoclusters carried
out in this way are not following the entire protocluster, but
are focussed on just the growth of the central region.
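The conversion between angular and comoving size quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a flat ΛCDM model with the Millennium parameters (Ω_0 = 0.25, Ω_Λ = 0.75) and a simple Simpson's rule integration; the function names are illustrative and this is not the code behind ‘The Cosmology Calculator’.

```python
import math

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc), i.e. the Hubble distance in h^-1 Mpc


def comoving_distance(z, omega_m=0.25, omega_l=0.75, steps=1000):
    """Comoving distance in h^-1 Mpc for a flat LCDM model (Simpson's rule, even steps)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    dz = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * dz)
    return C_OVER_H0 * total * dz / 3.0


def arcmin_to_comoving(theta_arcmin, z):
    """Transverse comoving size (h^-1 Mpc) subtended by an angle in arcmin (flat universe)."""
    return math.radians(theta_arcmin / 60.0) * comoving_distance(z)


def comoving_to_physical(length, z):
    """Physical length corresponding to a comoving length at redshift z."""
    return length / (1.0 + z)


small = arcmin_to_comoving(2.5, 2.0)   # aperture of 2.5 arcmin on a side
large = arcmin_to_comoving(7.0, 2.0)   # aperture of 7 arcmin on a side
extent = comoving_to_physical(45.0, 2.0)  # 45 h^-1 Mpc comoving at z = 2
```

Under these assumptions the two apertures come out close to the 2.8 and 7.7 h^-1 Mpc quoted in the text, and the 45 h^-1 Mpc comoving extent corresponds to 15 h^-1 Mpc physical at z = 2.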

To further illustrate the large spatial extent of the
protocluster, we plot the radius that encloses 90 per cent of the
stellar mass at different redshifts in Figure 2. The 90 per cent
stellar mass radius is strongly correlated with the 90 per cent
dark matter mass radius, making it an excellent measure of
cosmological growth. The protoclusters are binned by their
z = 0 mass and sizes are presented in comoving, physical
and angular scales. We explore the difference between
defining cluster members using the FoF halo or virial radius in
the Appendix.
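A radius enclosing 90 per cent of the stellar mass can be computed by sorting member galaxies by distance from a chosen centre and accumulating their masses. The sketch below is one way to do this; the choice of centre is left to the caller (the text does not specify it here), periodic boundaries are ignored, and the function name is hypothetical.

```python
import numpy as np


def enclosing_radius(positions, masses, centre, fraction=0.9):
    """Smallest galaxy distance from `centre` within which `fraction` of the mass lies."""
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    radii = np.linalg.norm(positions - np.asarray(centre, dtype=float), axis=1)
    order = np.argsort(radii)
    cumulative = np.cumsum(masses[order])
    index = np.searchsorted(cumulative, fraction * cumulative[-1])
    index = min(index, len(radii) - 1)  # guard against rounding at fraction = 1
    return radii[order][index]


# Toy example: ten equal-mass galaxies at distances 1, 2, ..., 10 along the x axis.
toy_positions = [[float(r), 0.0, 0.0] for r in range(1, 11)]
toy_masses = np.ones(10)
r85 = enclosing_radius(toy_positions, toy_masses, centre=[0.0, 0.0, 0.0], fraction=0.85)
```

Dividing the comoving result by (1 + z) gives the physical radius, and dividing by the comoving distance gives the angular size, which are the three scales shown in Figure 2.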

In the comoving reference frame, protoclusters
continually collapse, albeit gradually above z = 3. At these high
redshifts the protoclusters display a similar comoving size,
for fixed z = 0 mass, indicative of the shorter amount of
cosmic time that passes compared to lower redshift and the late
collapse of clusters. The difference in size with mass at fixed
redshift is also much larger at high redshift compared with
the present day. The large sizes are in agreement with those
found by Chiang et al. (2013) using an alternative measure
of the protocluster’s radius.

In physical units the behaviour of the protocluster
appears quite different. At high redshift protoclusters are still
expanding with the Universe before collapse occurs after
z = 1. Therefore the global density of protoclusters
decreases with time until z ∼ 1, after which they rapidly
collapse. From an observational point of view, protoclusters
extend over approximately the same angle across the sky
from z ∼ 5 to z ∼ 1. This means that protocluster detection
algorithms do not need to search over different sized
apertures to locate protoclusters at different redshifts: a single
fixed angular aperture will suffice. However, the size of the
aperture could be adjusted to select protoclusters of different mass.
Additionally, the large spatial extent of protoclusters
means that using off-centre galaxies as a field sample may
not produce a clean sample. As emphasised in Figure 1, the
complex structure of protoclusters means they can be very
extended in one direction. Therefore, even at large radii from
the protocluster core there may be dense regions of
protocluster galaxies. To be certain of having a clean field sample
for comparisons, field galaxies should be selected from
regions more than 20 arcmin from the protocluster core. This
conclusion is consistent with low redshift theoretical and
observational studies of clusters that have also emphasised the
importance of selecting field samples far away from the cluster
(e.g. Bahé et al. 2013; Haines et al. 2015).

Figure 3. The 3rd nearest neighbour galaxy density of protocluster
galaxies (red dashed line) relative to all galaxies (solid
black line), as a function of stellar mass. High mass galaxies show
similar environments due to most of them being in protoclusters,
however low mass galaxies diverge.

The next generation of large galaxy surveys, such as the
Large Synoptic Survey Telescope (LSST) and Euclid, have
the potential to locate many protoclusters and high redshift
clusters. Existing techniques used to detect clusters at low
redshift, such as X-ray identification or using the red
sequence, are not suitable for protoclusters as they are not
evolved enough to possess these properties. However, one
method of detecting protoclusters that would be suitable
is to use an environment measure to identify overdensities.
In Figure 3 we plot the third nearest neighbour density
against galaxy stellar mass for all galaxies in the Millennium
Simulation and those we have defined as protocluster members
(any galaxy that will merge into the z = 0 clusters).
The environment is characterised using δ:

$$\delta = \frac{\delta\rho}{\bar{\rho}} = \frac{\rho - \bar{\rho}}{\bar{\rho}} \qquad (1)$$

where ρ is the galaxy density and ρ̄ is the average density of
all galaxies. As expected, there is a clear trend for massive
galaxies to reside in denser environments. For very massive
galaxies there is little difference between the environments
occupied by protocluster galaxies and all galaxies. This
implies that most massive galaxies at high redshift reside in
protoclusters. For lower mass galaxies, the two curves
diverge, showing that the 3rd nearest neighbour density
measure can pick out the protocluster overdensity relative to the
field for all masses. This means that measuring the
environment of low mass galaxies around high mass galaxies offers
the opportunity to locate protoclusters in large photometric
redshift surveys. Measuring accurate environments is more
difficult at high redshift (Shattow et al. 2013) and the
ability to accurately detect protoclusters using this method will
be explored in future work.

Figure 4. The fraction of z = 2 galaxies within a given comoving
volume, centred on the largest protocluster halo, that are
protocluster members. Apertures are defined as the side length of a
cube. Small apertures that are typical of Hα narrow-band
imaging produce low contamination.
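To make the environment measure concrete, the sketch below computes a third nearest neighbour density and the overdensity δ of equation (1). The density estimator used here, n divided by the volume of the sphere reaching the n-th neighbour, is an assumed convention that the text does not spell out; the brute-force distance matrix ignores periodic boundaries and is only practical for small samples.

```python
import numpy as np


def nth_neighbour_density(positions, n=3):
    """Number density inside the sphere reaching each galaxy's n-th nearest neighbour."""
    positions = np.asarray(positions, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist.sort(axis=1)  # after sorting, column 0 is the galaxy itself (distance 0)
    r_n = dist[:, n]
    return n / (4.0 / 3.0 * np.pi * r_n ** 3)


def overdensity(density, mean_density):
    """delta = (rho - rho_bar) / rho_bar, as in equation (1)."""
    return (density - mean_density) / mean_density


# Toy example: four galaxies spaced one unit apart along a line.
toy = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
rho = nth_neighbour_density(toy, n=3)
delta = overdensity(rho, rho.mean())
```

For a full simulation volume a spatial tree would replace the distance matrix, and ρ̄ would be taken over all galaxies rather than over the toy sample used here.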

Finally, we look at the level of contamination associated
with the size of the aperture. As we have seen,
instrumental fields-of-view restrict observations
to small apertures which do not capture the full
protocluster. Using a larger aperture would capture the whole
protocluster, but would introduce a higher level of
contamination from non-protocluster members.

In Figure 4 we plot the fraction of galaxies that are
protocluster members for different masses within different
sized apertures. Apertures are defined as the side length of
a cube. For all apertures there is little contamination at the
very high mass end, reaffirming our conclusion that most
high mass galaxies are in protoclusters. This also reaffirms
the previous conclusion that in order to attain a clean sample
of field galaxies it is important to search further than 20
arcmin from the protocluster, as many galaxies, especially
those with M_* > […], closer than this are likely to
be protocluster members. For small apertures the level of
contamination is low at all masses, reflecting the fact that
only a small amount of the protocluster is being detected.
For large apertures the contamination is significant for low
mass galaxies and by 50 h^-1 Mpc it is the same as randomly
sampling the Universe. This implies that small apertures
(< 10 h^-1 Mpc) are required to produce a clean sample. If
the protocluster galaxies are defined as all those that enter
the virial radius by z = 0, all of these purity fractions
decrease by between a few and 10 per cent. Furthermore,
the contamination levels presented here are lower limits due
to the use of cubes. In reality the z dimension would in most
cases be significantly larger than the other two dimensions,
due to redshift uncertainties. This would increase the
level of contamination.

Figure 5. The evolution of halo mass for clusters binned by z =
0 mass. The lines show the fraction of mass in the main halo
(solid line) and all haloes that will merge to make the final cluster
(dashed line) relative to the z = 0 mass of the halo for different
z = 0 mass clusters. Most of the protocluster mass is spread
amongst many haloes and not concentrated in the main halo.
Error bars represent 1σ scatter and are offset about the middle
mass bin by δz = 0.05 for clarity.
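The purity measurement behind Figure 4 amounts to counting, inside a cube centred on the main halo, how many galaxies are flagged as protocluster members. A minimal sketch follows; the membership flags are assumed to come from the merger-tree tracing, the stellar mass binning is omitted, and the function name is hypothetical.

```python
import numpy as np


def purity_in_cube(positions, is_member, centre, side):
    """Fraction of galaxies inside a cube of side length `side` that are protocluster members."""
    positions = np.asarray(positions, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    offset = np.abs(positions - np.asarray(centre, dtype=float))
    inside = np.all(offset <= side / 2.0, axis=1)
    if not inside.any():
        return float("nan")  # no galaxies fall in the aperture
    return is_member[inside].mean()


# Toy example: two members near the centre, two non-members further out.
toy_positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
toy_flags = [True, True, False, False]
small_cube = purity_in_cube(toy_positions, toy_flags, centre=[0.0, 0.0, 0.0], side=4.0)
large_cube = purity_in_cube(toy_positions, toy_flags, centre=[0.0, 0.0, 0.0], side=10.0)
```

Replacing the cube with a box that is elongated along the line of sight would mimic the effect of redshift uncertainties discussed above.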


3.2 The relationship between a protocluster and 
its main halo 

In Figure 1 we showed that the protocluster environment is
a complex “clumpy” structure. We define the main halo as
being the most massive progenitor halo in the protocluster
at a given redshift. If the main halo is massive enough it will
be observed as a high redshift cluster.

Figure 6. The fraction of galaxies in the main halo compared
to the full protocluster with redshift. Solid lines are for galaxies
with stellar mass M_* > 10^8 h^-1 M_⊙, while dashed lines are
for M_* > […]. Colours correspond to different cluster
masses binned by z = 0 mass. Galaxies of higher mass are more
likely to be found in the main halo.

3.2.1 Main halo versus protocluster mass 

To explore how the growth of the main halo relates to the
protocluster as a whole, in Figure 5 we plot the mass
assembly of both. The solid lines correspond to the fraction
of the z = 0 halo mass that is present in the largest
progenitor halo with redshift. The dashed lines correspond to
the fraction of mass in all haloes hosting a galaxy of at
least M_* = 10^8 h^-1 M_⊙ at that redshift that will merge to
form the final cluster. The assembly plot has been grouped
by final mass and, as expected from hierarchical growth,
high mass clusters build up their mass later, although this
difference is within the 1σ error.

The rate of mass growth of the main halo differs from
that of the whole protocluster. The growth of the main halo
is slower than the rest of the haloes in the protocluster at
z > 2, but at z < 2 the main halo grows more rapidly. This
is a manifestation of the hierarchical growth of dark matter
structures. In the early Universe there is a rapid increase in
the number of small dark matter haloes in the protocluster
that become large enough to host a galaxy. Also, at low
redshift (z < 0.5), there are significantly fewer dark matter
haloes remaining in the protocluster which have not yet
merged with the main halo.

Figure 5 illustrates that less than ∼20 per cent of the
protocluster’s mass is in the main halo at high redshift (z >
2). At z = 2 the main halo contains only ∼10 per cent of
the z = 0 cluster’s mass for a massive halo. This means that
studying only the main progenitor ignores a vast amount of
information about the forming cluster.

Figure 5 considered the growth of the dark matter halo.
We now look at the galaxies that reside in the main halo
compared to the protocluster by plotting the fraction of
galaxies in the main halo in Figure 6. These fractions are
determined using two different stellar mass cuts. As the
redshift decreases the fraction of protocluster galaxies in the
main halo increases for M_* > 10^8 h^-1 M_⊙. This is expected
as merging with the main halo brings in galaxies that were
residing outside of it.

Figure 7. The mass of the most massive progenitor halo (M_1) at
z = 2 for a given z = 0 cluster mass. Points are coloured by the
ratio of the mass of the second most massive progenitor (M_2) to the
first most massive (M_1) at z = 2. The solid black line gives the
best fit to the data with dashed lines representing the 1σ scatter
about this.

For a higher stellar mass cut of M_* > 10^[…] h^-1 M_⊙,
however, there is an unexpectedly different trend. As the
redshift decreases to z = 3, the fraction of galaxies in the
main halo decreases before increasing again after this point.
Massive galaxies cannot leave the main halo, therefore this
effect can only occur through galaxies outside of the main halo
gaining enough mass to enter the sample. These trends are
apparent for protoclusters of all masses. Additionally, the most
massive halo of higher mass protoclusters is less significant
as it hosts a smaller fraction of the total galaxies.

Regardless of cluster mass, no main halo hosts more
than 30 per cent of protocluster galaxies with M_* >
10^8 h^-1 M_⊙ at z > 2. Observations that only study galaxies
within the main progenitor halo (i.e. the high redshift
cluster) miss the majority of cluster galaxy progenitors. To
trace the evolution of cluster galaxies it is essential that a
representative fraction of the protocluster is observed.

3.2.2 Estimating the z = 0 cluster mass from the main
halo mass at high redshift

Having shown that the main halo contains only a fraction
of the mass and galaxies of the protocluster, we now explore
how much the structure of the protocluster can reveal about
its evolutionary state, and the mass of the cluster it will
become by z = 0. The evolutionary state of a protocluster
is described by the fraction of matter already located within
the main protocluster halo. Protoclusters that contain a high
fraction of their mass within the main halo are defined as
further evolved.

Figure 8. Same as Figure 7 but this time only protoclusters with
a mass ratio of the second most massive progenitor to the first
most massive (M_2/M_1), at z = 2, greater than 0.85 (red) and
less than 0.15 (blue) are shown. The solid lines give the best fit
to the data with dashed lines representing the 1σ scatter about
this. The black dot-dashed line represents the fit to all M_2/M_1
values.

Figure 7 plots the mass of the main halo at z = 2 against
the mass of the cluster at z = 0. The median best fit to the
data and 1σ deviation are shown by the black solid and
dashed lines. These are determined by bootstrap resampling
the least-squares fit 100,000 times. There is a clear correlation
between the mass of the main progenitor halo at z = 2
and the mass of the resultant z = 0 cluster: more massive
progenitors tend to evolve into more massive clusters.
However, for a given z = 0 cluster mass, there is a large scatter
in the range of masses for the main halo of the protocluster
at z = 2. This means that protoclusters exist in a range of
evolutionary states at high redshift, which is nearly
independent of the mass they will grow to by z = 0. Thus estimating
the z = 0 mass of a cluster by extrapolating the mass of the
high redshift cluster should be considered highly uncertain
due to variation in the accretion history during cluster
formation.
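A bootstrapped least-squares fit of this kind can be sketched as follows. The example resamples the data points with replacement, fits a straight line to each resample and takes the median slope and intercept. Fitting in log-log space, the use of `numpy.polyfit`, the definition of the scatter as the standard deviation of the residuals and the toy data are all assumptions made for illustration, not details given in the text.

```python
import numpy as np


def bootstrap_linear_fit(x, y, n_boot=1000, seed=0):
    """Median slope and intercept from least-squares fits to bootstrap resamples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fits = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample with replacement
        fits[i] = np.polyfit(x[idx], y[idx], 1)      # returns (slope, intercept)
    slope, intercept = np.median(fits, axis=0)
    scatter = np.std(y - (slope * x + intercept))    # residual scatter about the median fit
    return slope, intercept, scatter


# Toy data (hypothetical): a noiseless relation between two log masses.
log_m_z0 = np.linspace(14.0, 15.0, 50)
log_m_main = 0.8 * log_m_z0 + 1.0
slope, intercept, scatter = bootstrap_linear_fit(log_m_z0, log_m_main, n_boot=200)
```

With real data the spread of the bootstrapped slopes and intercepts also gives an estimate of the uncertainty on the fit itself.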

If more than one progenitor halo can be identified, and
its mass measured, the accuracy and precision of the
extrapolation will increase. Each point in Figure 7 is colour coded
to indicate the ratio of the mass of the second most massive
halo in the protocluster (M_2) to the most massive (M_1)
at z = 2, i.e. an indication of the dominance of the main
halo in the protocluster. Clusters of all masses have a huge
range in this ratio at higher redshift, indicating the stochastic
nature of cluster formation. The scatter in the relation
between the cluster’s mass at z = 2 and z = 0 separates
into clear bands of protoclusters at different evolutionary
states at that redshift. Therefore the mass ratio of the two
most massive protocluster haloes (M_1 and M_2) provides an
approximation of the evolutionary state of the protocluster.

Figure 9. The fraction of protoclusters at z = 2 where the second most massive halo is greater than 0.8 (black solid), 0.6 (red dashed),
0.4 (blue dot-dashed) or 0.2 (magenta solid) times the mass of the most massive halo. The left panel presents this against the z = 0
cluster mass while the right panel is against the mass of the most massive halo in the protocluster.

Figure 8 shows two of these bands: protoclusters with dominant
main haloes (blue points; M_2/M_1 < 0.15) and protoclusters in
which there is no single dominant halo (red points;
M_2/M_1 > 0.85). The solid lines represent the median best fit to
these data, with dashed lines showing the 1σ scatter about these
lines. Both are again obtained by bootstrap sampling the
least-squares fit 100,000 times. Clusters with a single, dominant
progenitor halo (blue points) tend to have larger z = 2 masses
than those with higher M_2/M_1 ratios (red points) and correlate
more strongly with z = 0 cluster mass.
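
The bootstrap fitting procedure mentioned above can be sketched as
follows. This is a minimal illustration and not the code used for
the figures: the function name `bootstrap_linear_fit`, the synthetic
data and the reduced number of resamples (the text uses 100,000) are
assumptions made for the example. The points are resampled with
replacement, a straight line is refitted each time, and the median
and 16th to 84th percentile range of the fitted parameters are
reported.

```python
import numpy as np

def bootstrap_linear_fit(x, y, n_boot=1000, seed=0):
    """Bootstrap a least-squares straight-line fit y = a*x + b.

    Returns the median (slope, intercept) and the 16th/84th
    percentiles of each over the bootstrap resamples.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    params = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample with replacement
        params[i] = np.polyfit(x[idx], y[idx], 1)   # [slope, intercept]
    median = np.median(params, axis=0)
    lo, hi = np.percentile(params, [16, 84], axis=0)
    return median, lo, hi

# Synthetic stand-in for (log M at z = 0, log M_main at z = 2);
# these are not simulation data.
rng = np.random.default_rng(1)
log_m_z0 = rng.uniform(14.0, 15.0, 500)
log_m_z2 = 0.8 * log_m_z0 + 1.5 + rng.normal(0.0, 0.3, 500)

median, lo, hi = bootstrap_linear_fit(log_m_z0, log_m_z2)
print("slope = %.2f (+%.2f/-%.2f)"
      % (median[0], hi[0] - median[0], median[0] - lo[0]))
```

Applying the same function separately to the low and high M_2/M_1
subsamples would give one fitted line per band.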

Extrapolating the main halo mass to the z = 0 cluster mass,
whilst taking into account the mass ratio of the two most massive
haloes within the protocluster, improves not only the precision
but also the accuracy. For protoclusters with M_2/M_1 < 0.15, the
scatter in the mass of the main halo at z = 2 for a given z = 0
cluster mass halves. Additionally, the accuracy of this
measurement increases, which can be quantified by considering the
RMS of the difference between the predicted and true z = 0
cluster mass. For protoclusters with M_2/M_1 < 0.15, the estimate
of the z = 0 mass made without any information about the M_2/M_1
ratio (i.e. using the black dot-dashed fit) would have a 0.54 dex
RMS deviation from the true mass; however, this decreases to
0.15 dex when the ratio is taken into account (using the blue
fit). Observational studies which wish to estimate the z = 0 mass
of a protocluster may therefore improve the accuracy and
precision of their estimate by measuring the mass of the two most
massive haloes in the protocluster.
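
The RMS statistic quoted above is simple to compute once predicted
and true z = 0 masses are available. The sketch below uses made-up
numbers and a hypothetical helper, `rms_dex`; it only illustrates
the definition (the root-mean-square of the difference in log mass)
and does not reproduce the 0.54 and 0.15 dex values.

```python
import numpy as np

def rms_dex(log_m_pred, log_m_true):
    """RMS of (predicted - true) log10 mass, in dex."""
    diff = (np.asarray(log_m_pred, dtype=float)
            - np.asarray(log_m_true, dtype=float))
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative values only (log10 of mass); not from the simulation.
log_m_true = np.array([14.2, 14.5, 14.8, 15.0])
log_m_pred_all = np.array([14.7, 14.0, 15.3, 14.4])   # fit ignoring M_2/M_1
log_m_pred_band = np.array([14.3, 14.4, 14.9, 14.9])  # fit for one M_2/M_1 band

print(rms_dex(log_m_pred_all, log_m_true))
print(rms_dex(log_m_pred_band, log_m_true))
```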

The masses of the two most massive haloes within a protocluster
can be measured observationally in a number of ways. Galaxy
velocity dispersions can give an estimate of the dynamical mass
under the assumption that the galaxies are in virial equilibrium.
Such a method has been used by Shimakawa et al. (2014) to
estimate the mass of two groups in a z = 2.53 protocluster,
finding a ratio of 0.1. Alternatively, the mass of the haloes can
be measured through observations of the intracluster gas using
sensitive X-ray observatories, or at submillimetre wavelengths to
detect the Sunyaev-Zel'dovich (SZ) decrement. Current
instrumentation is only able to measure collapsed structures at
z > 1.5 with masses greater than 10^14 M⊙ (e.g. Stanford et al.
2012; Brodwin et al. 2012; Andreon et al. 2014), but forthcoming
instrumentation, such as the ESA Athena satellite and the full
ALMA array, will be able to detect much smaller groups at high
redshift. If the ratio M_2/M_1 is sufficiently small, then it may
not be possible to measure the mass of M_2 directly through
X-rays or the SZ decrement. In this case it may be sufficient to
measure the mass of M_1 through a detection of the intracluster
medium and estimate the mass of M_2 through the ratio of stellar
mass enclosed in M_2 and M_1. The stellar mass is a good tracer
of the total cluster mass at low and intermediate redshifts
(Mulroy et al. 2014; Ziparo et al. in prep.) and the next
generation of accurate cosmological simulations will help
determine the relationship between the stellar mass and the total
mass within more distant groups.


3.2.3 The importance of the main halo within a 
protocluster 

Figure 10. The effect of centring on the stellar mass enclosed in
a cubical aperture of side length 1.1 h^-1 Mpc (solid line),
2.8 h^-1 Mpc (dashed) and 7.7 h^-1 Mpc (dot-dashed). Each curve
gives the total stellar mass enclosed when the nth most massive
galaxy in the protocluster is chosen as the centre with respect
to the most massive. For large apertures the difference in
stellar mass is small; however, the scatter between protoclusters
is large. Error bars represent the 1σ scatter and are offset for
clarity.

We further explore the scatter in cluster formation history by
showing the fraction of protoclusters where the mass ratio of the
second to the first most massive halo is more than a given value.
The left panel of Figure 9 shows that the second most massive
halo is at least 80 per cent of the mass of the most massive halo
in ~20 per cent of protoclusters. Only 10 per cent of
protoclusters have a dominant main halo where the next largest is
less than 20 per cent of its mass. These results are independent
of the z = 0 mass of the cluster except in the largest mass bin
where there are few objects, which means the accretion history is
erratic for clusters of all masses. The right panel of Figure 9
shows the same mass ratios, but this time against the mass of the
main halo at z = 2. A clear mass dependence can be seen, with
large main haloes significantly less likely to have a massive
companion in the protocluster.
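
The fractions plotted in Figure 9 amount to counting, in each mass
bin, the protoclusters whose M_2/M_1 exceeds a threshold. A minimal
sketch is given below; the function names and the toy catalogue
(uniformly distributed ratios) are assumptions for illustration and
do not reproduce the simulation results.

```python
import numpy as np

def fraction_above(ratio, thresholds=(0.8, 0.6, 0.4, 0.2)):
    """Fraction of protoclusters whose M_2/M_1 exceeds each threshold."""
    ratio = np.asarray(ratio, dtype=float)
    return {t: float(np.mean(ratio > t)) for t in thresholds}

def fraction_above_binned(log_mass, ratio, bin_edges, threshold):
    """Same fraction, evaluated separately in bins of log mass."""
    log_mass = np.asarray(log_mass, dtype=float)
    ratio = np.asarray(ratio, dtype=float)
    which = np.digitize(log_mass, bin_edges) - 1   # bin index per object
    fractions = []
    for b in range(len(bin_edges) - 1):
        in_bin = which == b
        if in_bin.any():
            fractions.append(float(np.mean(ratio[in_bin] > threshold)))
        else:
            fractions.append(float("nan"))         # empty bin: undefined
    return fractions

# Toy catalogue: ratios drawn uniformly, so the expected fractions are known.
rng = np.random.default_rng(2)
ratio = rng.uniform(0.0, 1.0, 2000)
log_mass = rng.uniform(14.0, 15.0, 2000)
print(fraction_above(ratio))
print(fraction_above_binned(log_mass, ratio, [14.0, 14.5, 15.0], 0.8))
```

Passing the z = 0 cluster mass or the z = 2 main halo mass as
`log_mass` would correspond to the left and right panels
respectively.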

Overall, these Figures illustrate that the largest halo in many
protoclusters should not be considered the dominant halo as it is
often not significantly larger than other haloes. These results
also suggest that the observed examples of main halo dominated
protoclusters are the result of potential bias in the
observations. Larger main haloes are easier to detect using
current cluster-finding techniques (e.g. red sequence algorithms,
X-ray or SZ detections). Since it is easier to locate high
redshift clusters with massive first ranked haloes, this
subsample of evolved protoclusters will dominate observations.
However, a significant fraction of massive clusters at z = 0 do
not yet have massive dominant haloes during the protocluster
stage at high redshift. These less-evolved cluster progenitors
would be missed by surveys searching for high redshift clusters,
which target the most massive objects, and yet they are equally
likely to evolve into massive clusters by z = 0. It is possible
that the accretion history of a cluster leaves a lasting trace on
its gas properties and the distribution of its galaxies. To trace
the different evolutionary paths taken by collapsing clusters we
must search for all types of cluster progenitors.

Figure 11. The galaxy stellar mass function for all galaxies
within the simulation at z = 2 (black solid line), those tagged
as protocluster galaxies (red dashed line) and those tagged as
field galaxies (i.e. not protoclusters; blue dot-dashed line).
The protocluster mass function has more massive galaxies and a
shallower low mass slope.

If the whole of the protocluster cannot be viewed, and it is not
clear if there is a main halo, it is important to see if the
choice of the image centre has an effect on the results obtained.
For protoclusters selected due to the presence of a radio-loud
AGN (such as in the Clusters Around Radio-Loud AGN (CARLA)
survey; Wylezalek et al. 2013), the centre of the protocluster is
often assumed to be the position of the radio-loud AGN, which is
typically one of the most massive galaxies in the protocluster.

In Figure 10 we present the stellar mass enclosed by various
sized apertures, centred on the 10 most massive galaxies in the
protocluster at z = 2, with respect to that centred on the most
massive. For an aperture of 7.7 h^-1 Mpc (7 arcmin) the
difference in enclosed mass between an aperture centred on the
1st and 10th most massive galaxy is small, dropping to 94 per
cent of the 1st most massive. For the smallest aperture tested,
1.1 h^-1 Mpc (1 arcmin), however, the drop is significantly
larger, decreasing to 56 per cent of the mass enclosed by the
aperture centred on the most massive galaxy. An analogous
calculation can be made for the enclosed star formation rate,
which shows a smaller change.
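
The centring test described above can be sketched as follows. This
is an illustration under stated assumptions, not the analysis code:
the function names are hypothetical, the galaxy positions and masses
are random toy values, and periodic boundaries of the simulation box
are ignored.

```python
import numpy as np

def enclosed_mass(positions, masses, centre, side):
    """Total stellar mass inside a cube of the given side length."""
    offset = np.abs(np.asarray(positions) - np.asarray(centre))
    inside = np.all(offset <= side / 2.0, axis=1)
    return float(np.sum(np.asarray(masses)[inside]))

def mass_ratio_by_rank(positions, masses, side, n_rank=10):
    """Enclosed mass centred on the nth most massive galaxy,
    relative to that centred on the most massive one."""
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    order = np.argsort(masses)[::-1]              # most massive first
    ref = enclosed_mass(positions, masses, positions[order[0]], side)
    return [enclosed_mass(positions, masses, positions[i], side) / ref
            for i in order[:n_rank]]

# Toy protocluster: random positions (h^-1 Mpc) and log-uniform masses.
rng = np.random.default_rng(3)
pos = rng.normal(0.0, 3.0, size=(300, 3))
mstar = 10.0 ** rng.uniform(9.0, 11.5, 300)
for side in (1.1, 2.8, 7.7):
    print(side, np.round(mass_ratio_by_rank(pos, mstar, side), 2))
```

Replacing `masses` with star formation rates in the enclosed sum
gives the analogous calculation for the enclosed star formation rate.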

The most noticeable feature of Figure 10 is the huge scatter
between protoclusters. For the smallest aperture tested, the mass
enclosed by an aperture centred on the 10th most massive galaxy
can range from 0.03 to 5 times that enclosed by an aperture
centred on the most massive galaxy. This large variation means
there is a large uncertainty associated with the observed mass
and star formation rate of protoclusters if the main halo cannot
be readily identified. Observing large samples of protoclusters
removes much of this uncertainty, therefore large statistical
studies of protoclusters are less affected by this issue.

Figure 12. The fraction of star forming galaxies as a function of
mass using two different star formation cuts. The solid lines
correspond to a fixed cut in SFR (Cooke et al. 2014), while the
dashed line corresponds to a fixed cut in sSFR (Lani et al.
2013). A fixed SFR cut, such as that in Hα narrow-band
observations, can cause the artificial loss of star forming low
mass galaxies.

3.3 Biases introduced by observing only 
star-forming protocluster galaxies 

A very common technique to locate a clean sample of protocluster
galaxies is to select line-emitting galaxies, such as Hα, Lyα and
[O II] emitters, using narrow filters (e.g. Venemans et al. 2007;
Koyama et al. 2013). However, this method is only able to
identify the active subset of protocluster galaxies. Here we
explore how our interpretation of the galaxy stellar mass
function and overdensity of the protocluster can be affected by
only studying the star forming galaxy population.

In Figure 11 we plot the galaxy stellar mass function for
protocluster members (red dashed line), field galaxies
(non-protocluster members; blue dot-dashed line) and all galaxies
(the sum of the two; black solid line) at z = 2. The shape of the
protocluster mass function differs slightly from that of the
field. There are more massive galaxies in protoclusters than in
the field, despite there being more galaxies in general in the
field. At the low mass end, the slope of the protocluster mass
function is shallower than that of the field. The value for the
turnover (M*) is fractionally higher for the protocluster. The
shallower low mass slope in the semi-analytic model protoclusters
reduces the number of expected low mass galaxies compared to the
field, but not greatly.

Figure 13. The fraction of star forming galaxies with respect to
environment using a fixed SFR cut at z = 2. The black solid line
corresponds to field galaxies, whilst the red solid line
corresponds to protocluster galaxies. The blue dashed and
dot-dashed lines correspond to just galaxies within a
2.8 h^-1 Mpc comoving (2.5 arcmin) and 1.1 h^-1 Mpc comoving
(1 arcmin) cube centred on the most massive protocluster galaxy,
respectively. The larger of these is similar to the aperture used
by Cooke et al. (2014) and shows a strong environmental relation
with the number of star-forming galaxies observed.

Conventional definitions of star forming and non-star forming
galaxies involve cuts in specific star formation rate
(sSFR = SFR/M*). One such definition, used in Lani et al. (2013)
for example, is to define a galaxy as non-star forming if its
mass doubling time, calculated from its present star formation
rate, is more than the age of the Universe. Applying this to our
simulated z = 2 galaxy sample yields a star forming galaxy
fraction corresponding to the dashed line in Figure 12. This
demonstrates that there are fewer star forming galaxies in the
protocluster than in the field, but in general they follow the
same trend with mass.
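
The mass doubling time criterion can be written as a one-line cut
in sSFR. The sketch below is illustrative only: the function name
is hypothetical and the age of the Universe at z = 2 is set to an
approximate assumed value of 3.3 Gyr, which depends on the adopted
cosmology.

```python
import numpy as np

AGE_AT_Z2_YR = 3.3e9  # assumed age of the Universe at z = 2 (approximate)

def is_star_forming_ssfr(mstar, sfr, age_yr=AGE_AT_Z2_YR):
    """True where the mass doubling time M*/SFR is shorter than age_yr.

    mstar in Msun, sfr in Msun/yr. A galaxy with zero SFR never doubles
    its mass and is therefore classed as non-star forming.
    """
    mstar = np.asarray(mstar, dtype=float)
    sfr = np.asarray(sfr, dtype=float)
    ssfr = sfr / mstar                 # specific star formation rate, yr^-1
    return ssfr > 1.0 / age_yr

mstar = np.array([1e9, 1e10, 1e11, 1e11])
sfr = np.array([1.0, 1.0, 10.0, 50.0])
print(is_star_forming_ssfr(mstar, sfr))
```

Because the cut is on SFR/M*, a low mass galaxy with a modest star
formation rate can still be classed as star forming.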

Imaging in a narrow-band to detect emission line galaxies does
not select galaxies based on their sSFR, but instead produces a
cut in SFR. This can have a different effect, especially at the
low mass end, as a galaxy's SFR is dependent on its mass if it is
on the main star forming sequence. Applying a cut of 7 M⊙ yr^-1
(typical of recent works, e.g. Cooke et al. 2014) gives a star
forming fraction that corresponds to the solid lines in
Figure 12. This produces a very different trend to that of the
sSFR cut. At low masses the star forming fraction rapidly
descends to zero as the cut intercepts the star forming main
sequence. This leads to a minimum detectable mass for
emission-line selected galaxies such as those observed in Hα
narrow-band images. Thus selecting galaxies by their star
formation rate biases against low-mass galaxies.
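
The minimum detectable mass follows from where the SFR cut crosses
the star forming main sequence. The sketch below assumes a toy
power-law main sequence; its normalisation and slope are
placeholders chosen for illustration and are not taken from the
semi-analytic model, so the resulting mass is not a prediction.

```python
import numpy as np

SFR_CUT = 7.0  # Msun/yr, the fixed detection threshold quoted in the text

def main_sequence_sfr(mstar, sfr_at_1e10=20.0, slope=0.8):
    """Toy star forming main sequence; both parameters are placeholders."""
    return sfr_at_1e10 * (np.asarray(mstar, dtype=float) / 1e10) ** slope

def min_detectable_mass(sfr_cut=SFR_CUT, sfr_at_1e10=20.0, slope=0.8):
    """Stellar mass at which the toy main sequence crosses the SFR cut."""
    return 1e10 * (sfr_cut / sfr_at_1e10) ** (1.0 / slope)

mstar = 10.0 ** np.arange(8.0, 12.0, 0.5)
detected = main_sequence_sfr(mstar) >= SFR_CUT
print("minimum detectable mass: %.2e Msun" % min_detectable_mass())
print(np.column_stack([np.log10(mstar), detected]))
```

Main sequence galaxies below this mass fall under the threshold
regardless of their sSFR, which is the origin of the bias against
low mass galaxies.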

The limitations of instrumental field of view mean that only a
small fraction of the protocluster is typically observed. If we
consider only the central region of the protocluster, rather than
all members, we get a further bias in the results. Figure 13
replots the star forming fraction of protocluster and field
galaxies with a 7 M⊙ yr^-1 cut (solid lines), but only including
the star forming galaxies within the central 2.8 h^-1 Mpc
comoving (2.5 arcmin; blue dashed line) and 1.1 h^-1 Mpc comoving
(1 arcmin; blue dot-dashed line) regions of the protocluster. The
small window only captures the densest region, where quenching is
efficient, and this leads to a much lower fraction of star
forming galaxies relative to the field.

By observing only the star forming galaxies in the main halo of
the protocluster we obtain a very biased view of the mass
function of protocluster galaxies. A fixed threshold intercepts
the star forming main sequence, resulting in the suppression of
galaxies detected below 10^10 M⊙ and a near total loss of
galaxies below 10® M⊙. In addition to the loss of galaxies
because they drop below the star formation rate threshold, the
small windows used for narrow-band observations result in a
further loss. Focussing on just the very centre, as opposed to
the full protocluster, significantly increases the quenched
fraction of galaxies. This is because more environmental
quenching occurs within the densest part of the protocluster.
While larger apertures would reveal more of the protocluster,
they would also increase the level of contamination by
non-protocluster members. For small apertures the sample has
little contamination, but by 10 h^-1 Mpc at least 20 per cent of
low mass galaxies are interlopers.

An important side effect of losing low mass galaxies in
narrow-band observations is that the measured overdensities for
protoclusters will be highly uncertain. If the full observed
sample of galaxies is used, then the absence of low mass galaxies
in the protocluster compared with the field will lead to the
overdensity being underestimated. Using a mass cut, however, is
also problematic. The simulations indicate that almost all very
massive galaxies reside in protoclusters and this will lead to an
unrepresentative field sample, leading to a very high overdensity
estimate. Quantifying the overdensity accurately is important for
estimating the eventual mass of the protocluster using the Chiang
et al. (2013) method. For these reasons, it is not advisable to
estimate the mass of a protocluster from the overdensity measured
from the excess of emission line galaxies in a small field of
view.
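
The two failure modes described above can be illustrated with a toy
calculation. The sketch assumes the usual definition of galaxy
overdensity, delta = N_observed / N_field - 1; all counts and
detection fractions are invented for the example and are not drawn
from the simulation.

```python
def overdensity(n_observed, n_field_expected):
    """Galaxy overdensity delta = n_observed / n_field_expected - 1."""
    if n_field_expected <= 0:
        raise ValueError("expected field count must be positive")
    return n_observed / n_field_expected - 1.0

# Toy counts in two mass bins within the same volume (illustrative only).
field = {"low": 40.0, "high": 2.0}    # expected field galaxies
proto = {"low": 120.0, "high": 14.0}  # galaxies in the protocluster field

# Assumed detected fractions after a fixed SFR cut; low mass protocluster
# galaxies are lost preferentially because more of them are quenched.
det_field = {"low": 0.5, "high": 0.9}
det_proto = {"low": 0.2, "high": 0.8}

true_delta = overdensity(sum(proto.values()), sum(field.values()))
obs_delta = overdensity(
    sum(proto[k] * det_proto[k] for k in proto),
    sum(field[k] * det_field[k] for k in field),
)
massive_delta = overdensity(proto["high"] * det_proto["high"],
                            field["high"] * det_field["high"])
print(true_delta, obs_delta, massive_delta)
```

With these numbers the full detected sample underestimates the true
overdensity, while restricting to the massive bin overestimates it.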


4 IMPLICATIONS FOR OBSERVATIONS 

We have explored the difference between protoclusters and high
redshift clusters using a semi-analytic model applied to the
Millennium Simulation. Clusters were identified as z = 0 haloes
with masses greater than or equal to 10^14 h^-1 M⊙. All galaxies
that will merge to make these clusters were tagged at higher
redshift and classed as protocluster members. The most massive
virialised dark matter halo in the protocluster is defined as the
main halo, and would be observed as a high redshift cluster or
group if it were massive enough.

We find that protoclusters are very extended, with 90 per cent of
the mass spread over ~35 h^-1 Mpc comoving at z = 2 (11 h^-1 Mpc
physical; 30 arcmin). This is far larger than the typical
targeted observations of protoclusters currently being conducted
using line-emitting galaxies. This implies that these studies of
protoclusters and high redshift clusters are not imaging all of
the protocluster, but instead are focussed on only a small part
of the structure.
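
The conversion between comoving size and angular size quoted above
can be checked with a short calculation. The sketch assumes a flat
ΛCDM cosmology with Ω_m = 0.25 and Ω_Λ = 0.75 (the values adopted
by the Millennium Simulation; they are not stated in this section)
and works in h^-1 Mpc so that no value of h is needed. The function
names are hypothetical.

```python
import numpy as np

C_OVER_H100 = 2997.92458  # c / (100 km/s/Mpc), in h^-1 Mpc

def comoving_distance(z, omega_m=0.25, omega_l=0.75, n_step=10001):
    """Line-of-sight comoving distance in h^-1 Mpc for flat LCDM."""
    zz = np.linspace(0.0, z, n_step)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    # Trapezoidal integration of dz / E(z).
    dz = zz[1] - zz[0]
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    return C_OVER_H100 * integral

def comoving_to_arcmin(size_comoving, z, **cosmo):
    """Angle (arcmin) subtended by a transverse comoving length
    (h^-1 Mpc); in a flat universe the transverse and line-of-sight
    comoving distances are equal."""
    theta_rad = size_comoving / comoving_distance(z, **cosmo)
    return np.degrees(theta_rad) * 60.0

for size in (1.1, 2.8, 7.7, 35.0):
    print("%5.1f h^-1 Mpc comoving -> %4.1f arcmin, %4.1f h^-1 Mpc physical"
          % (size, comoving_to_arcmin(size, 2.0), size / (1.0 + 2.0)))
```

Under these assumptions 35 h^-1 Mpc comoving at z = 2 subtends
roughly half a degree, consistent with the ~30 arcmin quoted in the
text.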

The protocluster structure comprises many haloes linked by
filaments. This has important consequences for the evolution of
cluster galaxies, since not all galaxies that make up the cluster
at z = 0 have had the same environmental history. Some will have
formed in the main halo, others will have been residing in
smaller haloes or in filaments for much of their history. Thus
the environmental history of cluster galaxies is complex and
non-uniform. Some galaxies experience strong 'environment
preprocessing', where galaxies experience environmental effects
prior to cluster infall, whereas others do not (e.g. De Lucia et
al. 2012).

We find that the largest halo of the protocluster only hosts a
minority of protocluster galaxies at high redshift, with
typically less than 20 per cent of galaxies with M* > 10®
residing within it at z > 2. To study the evolution of cluster
galaxies it is therefore essential that a representative fraction
of the protocluster is observed, and not simply the minority of
protocluster galaxies that reside within the high redshift
cluster core. Whilst this will improve our understanding of the
role of preprocessing, it does come at the expense of sample
purity.

We have shown that only a small subset of protoclusters evolve as
a single main halo with significantly smaller objects merging
onto it. Only 10 per cent of protoclusters at z = 2 are dominated
by a single halo, i.e. where no other member haloes in the
protocluster have more than 20 per cent of the main halo's mass.
A fifth of protoclusters exhibit very little difference between
the most massive and second-ranked halo, with a mass ratio
> 0.8. Whether a protocluster contains a dominant halo at high
redshift does not depend on its z = 0 mass; however, if the
first-ranked halo is very massive (so that it would be detected
as a high redshift group or cluster), then it is likely to be a
very dominant halo. Observational techniques that are predisposed
to locate protoclusters based on the mass of their main halo
(e.g. X-ray or SZ detection) are biased to select the subset of
protoclusters with single dominant haloes, and therefore are
likely to miss the majority of cluster progenitors with no
dominant halo.

Having many large haloes in the same protocluster will
additionally have important consequences for cluster cosmology.
The close proximity of large haloes in protoclusters will make it
difficult to separate them observationally. This may result in
haloes being classed as a single more massive object and hence
discrepant with the output of dark matter simulations.

For over a decade studies of protoclusters have used narrow
filters to isolate and study star-forming protocluster galaxies.
This technique is popular as it efficiently selects a relatively
clean sample of protocluster galaxies. However, several recent
observational studies have shown that the stellar mass function
of star-forming galaxies in protoclusters differs from that of
the field (Steidel et al. 2005; Hatch et al. 2011b; Koyama et al.
2013; Cooke et al. 2014; Husband et al. in prep). This means that
the mass function of star-forming galaxies in protoclusters is no
longer a scaled version of the field, and hence implies that the
bias of this population depends on environment. This has severe
implications for measuring the mass overdensity: the measured
galaxy overdensity may not be correctly converted to a mass
overdensity.

The semi-analytic model we have investigated suggests the
observed difference in the stellar mass functions is due to
environmental quenching of low mass star-forming galaxies. This
effect is exacerbated if the observations are concentrated on the
main halo where environmental quenching is strongest (Figure 13).
Observations taken with larger fields-of-view (greater than 10
comoving Mpc) will not be strongly impacted by these
environmental effects, and thus the biases of the field and
protocluster emission line galaxies will be similar on large
scales (as shown by Chiang et al. 2013). However, if the model
prescription of the quenching is too aggressive, the cause of the
observed mass function divergence may extend beyond the main
halo, and impact mass overdensities determined even from large
apertures. Future observations of the star-forming galaxy mass
function on larger scales are needed to test the environment
quenching scenario. In summary, the mass overdensity measured
from the excess of emission line galaxies should be considered
unreliable, especially in small apertures, and should not be used
to estimate the z = 0 mass of the protocluster.


5 CONCLUSIONS 

As highlighted at the start of this paper, the term protocluster
is used to describe the progenitors of galaxy clusters, but
differing definitions are used in the literature. Protoclusters
are diffuse collections of haloes, linked by filaments, that will
merge to make up the final low redshift clusters. These
structures are very extended, with 90 per cent of the mass spread
over ~15-35 h^-1 Mpc comoving at z = 2, with the radial extent
depending on the final mass of the cluster. High redshift
clusters are the manifestations of massive main haloes within
protoclusters. However, in most cases the largest halo of the
protocluster only hosts a minority of protocluster galaxies at
high redshift, so a representative fraction of the protocluster
must be observed to study the evolution of cluster galaxies.

Protoclusters exist in a range of evolutionary states at high
redshift, independent of the mass they will evolve to by z = 0.
Here we define evolution by the amount of z = 0 cluster mass in
the main halo. Only a small subset of protoclusters host a
dominant main halo that would be identifiable as a high redshift
cluster. The evolutionary state of a protocluster can be
approximated from the mass ratio of the first and second ranked
haloes in the protocluster. Furthermore, a more accurate estimate
of the mass of the z = 0 descendant cluster can be determined if
both the main halo mass and the evolutionary state of the
protocluster are known.

Large observations spanning several arcmin are required to view
all the different physical processes that affect galaxies within
forming clusters. The assembly history of clusters is varied and
we must examine protoclusters both with and without dominant main
haloes to understand the numerous paths by which clusters of
galaxies form. Future large scale observations of protoclusters
will offer the opportunity to better understand both cluster
formation and the importance of environmental history in galaxy
evolution.

APPENDIX A: CLUSTER SIZE DEFINITION

Within this paper we have defined protocluster member galaxies as
being any galaxy that merges and forms part of the
friends-of-friends halo of a z = 0 cluster. An alternative
definition would be to consider only those galaxies that reside
within the virial radius of the cluster at z = 0. In Figure A1 we
reproduce Figure 2, this time using only galaxies that will be
within the virial radius at z = 0. Using this definition results
in smaller sizes, but the same evolutionary pattern is still
present. The choice of cluster definition will result in small
changes to the absolute values quoted in this paper, but the
overall results will remain the same.

Figure A1. The average radius that encloses 90 per cent of the
stellar mass of a protocluster at different redshifts, for binned
z = 0 cluster masses. The left panel represents the comoving
radius, the centre panel the physical radius and the right panel
the angular projection. Error bars represent the 1σ scatter and
are offset about the middle mass bin by δz = 0.05 for clarity.
Only galaxies that will be within the virial radius at z = 0 are
considered.